PROTEA, A Southern African Multicenter Congenital Heart Disease Registry and Biorepository: Rationale, Design, and Initial Results

Objectives: The PartneRships in cOngeniTal hEart disease in Africa (PROTEA) project aims to establish a densely phenotyped and genotyped Congenital Heart Disease (CHD) cohort for southern Africa. This will facilitate research into the epidemiology and genetic determinants of CHD in the region. This paper introduces the PROTEA project, characterizes its initial cohort from the Western Cape Province of South Africa, and compares the proportions, or “cohort-prevalences,” of CHD-subtypes with international findings. Methods: PROTEA is a prospective multicenter CHD registry and biorepository. The initial cohort was recruited from seven hospitals in the Western Cape Province of South Africa from 1 April 2017 to 31 March 2019. All patients with structural CHD were eligible for inclusion. Descriptive data for the preliminary cohort are presented. In addition, cohort-prevalences (i.e., the proportion of patients within the cohort with a specific CHD-subtype) of 26 CHD-subtypes in PROTEA's pediatric cohort were compared with the cohort-prevalences of CHD-subtypes in two global birth-prevalence studies. Results: The study enrolled 1,473 participants over 2 years; the median age was 1.9 (IQR 0.4–7.1) years. Predominant subtypes included ventricular septal defect (VSD) (339, 20%), atrial septal defect (ASD) (174, 11%), patent ductus arteriosus (185, 11%), atrioventricular septal defect (AVSD) (124, 7%), and tetralogy of Fallot (121, 7%). VSDs were 1.8 (95% CI, 1.6–2.0) times and ASDs 1.4 (95% CI, 1.2–1.6) times more common in global prevalence estimates than in PROTEA's pediatric cohort. AVSDs were 2.1 (95% CI, 1.7–2.5) times more common in PROTEA, and pulmonary stenosis and double outlet right ventricle were also significantly more common in PROTEA than in global estimates. Median maternal age at delivery was 28 (IQR 23–34) years. Eighty-two percent (347/425) of mothers used no pre-conception supplementation and 42% (105/250) used no first trimester supplements.
Conclusions: The cohort-prevalence of certain mild CHD subtypes is lower than in international estimates and the cohort-prevalence of certain severe subtypes is higher. PROTEA is not a prevalence study, and these inconsistencies are unlikely to be the result of true differences in prevalence. However, these findings may indicate under-diagnosis of mild to moderate CHD and differences in CHD management and outcomes. This reemphasizes the need for robust CHD epidemiological research in the region.


INTRODUCTION
Congenital heart disease (CHD) is common, affecting 9 per 1,000 live births, and contributes significantly to the global burden of disease (1)(2)(3). In addition, CHD constitutes one-third of all congenital birth defects (4), a leading cause of childhood mortality (5). Thus, CHD is increasingly recognized as an important focus in the reduction of under-5 deaths and the realization of the United Nations' 2016 Sustainable Development Goals (3,6,7).
Accurate and contemporary epidemiological research is an essential first step in this process. Unfortunately, epidemiological data from Africa and low-income countries are lacking. Recent analyses of the global prevalence of CHD show significant geographic variation in reported prevalence rates, with African prevalence rates significantly lower than in other parts of the world (4,6,8). The study by van der Linde et al. (4), a meta-analysis of 114 papers from 1930 to 2010, found a global CHD birth prevalence of 9.1 per 1,000 live births. African data, however, indicated a birth prevalence of only 1.9 per 1,000 live births, significantly lower than all other regions. Similarly, the findings of Liu et al. (8), a recent global meta-analysis of 260 studies from 1970 to 2017, show that the reported birth prevalence of CHD in Africa was 2.3 per 1,000 live births, significantly lower than the global prevalence of 9.4 per 1,000 live births reported in the same study.
These results do not represent the true prevalence of CHD in Africa but rather reflect the extreme paucity of up-to-date research into CHD prevalence in the region (Figure 1). This premise is supported by the available literature, which documents the high burden of CHD (9)(10)(11)(12)(13). Rigorous, contemporary epidemiological data on sub-Saharan CHD are of immediate practical importance to inform and guide healthcare agencies and policymakers.
Similarly, the genetic architecture of the sub-Saharan CHD population is still largely understudied and the contribution of environmental factors unexplored (14). Region-specific genetic research is important for several reasons. Africa, as the birthplace of Homo sapiens, is home to genetically diverse populations. This genetic diversity renders them particularly powerful for identifying causative genetic variants (15). Despite this genetic diversity, African populations are under-represented in DNA databases, and there is a drive to correct this omission, importantly through the development of African-hosted DNA bio-repositories (16,17). More specifically, studies into the genetic architecture of cardiovascular disease have shown differences between Europe and sub-Saharan Africa (18), and the contribution of genetic syndromes and de novo mutations in known CHD genes is still unknown and may be clinically important.
The PROTEA (PartneRships in cOngeniTal hEart disease in Africa) project was created to determine the feasibility of maintaining a densely phenotyped and genotyped longitudinal CHD cohort in southern Africa. This cohort would facilitate future studies to address the lack of epidemiological and genetic data on CHD in southern Africa and help to develop clinical and cardiogenetic research infrastructure in the region.
The PROTEA project has four main aims. Aim 1 is to describe the phenotype and clinical management of CHD in southern Africa, following the implementation of a multicenter CHD registry and biorepository initially based in the Western Cape public cardiology service. Aim 2 is to investigate the genetic and molecular determinants of CHD in the region. Aim 3 is to study repaired tetralogy of Fallot and coarctation of the aorta using computational fluid dynamics, to demonstrate its potential to assist clinical assessment of CHD, including long-term prediction of growth and remodeling from local blood flow (19). The growing pool of data from Aims 1, 2, and 3 will support development of the "digital twin" concept (20). Here, the combination of computational physics, artificial intelligence and machine learning will enable model-based patient-specific outcome assessment.
Finally, Aim 4 is to build capacity for CHD research in southern Africa through the development of expertise and a sustainable research infrastructure. In addition, the PROTEA project will disseminate an integrated CHD electronic health record system (EHR) and research database. This is one of the key strengths of the project and distinguishes PROTEA from other registries. Many African centers have limited means to capture and store patient records electronically, and the PROTEA application will greatly benefit their clinical practice. For example, PROTEA enables immediate access to medical reports, improved clinical audit processes, insight into mortalities and morbidities and related opportunities for learning. Additionally, PROTEA provides teaching opportunities via instructional clinical record forms and facilitates future research.
This paper will introduce the PROTEA project and characterize its initial cohort from the Western Cape province of South Africa, enrolled over a 2-year period from 1 April 2017 to 31 March 2019. In addition, the "cohort-prevalences" (i.e., the proportion of patients within the cohort with a specific CHD-subtype) of CHD subtypes in PROTEA's pediatric cohort are compared with CHD subtype cohort-prevalences as described in two recent global meta-analyses of CHD birth-prevalence (4,8).

METHODS

Study Design
The PROTEA study is a prospective cohort of CHD in both children and adults which commenced in April 2017. The aim was to enroll 1,200 registry participants and collect 500 DNA repository samples over a 2-year period from April 1, 2017 to March 31, 2019. Enrolment is ongoing.

Setting and Population
Patients are recruited to Aim 1, the CHD registry, via convenience sampling primarily from three tertiary centers in the Western Cape Province of South Africa: Red Cross War Memorial Children's Hospital (RCWMCH), Tygerberg Hospital (TBH) and Groote Schuur Hospital (GSH), via the neonatal, pediatric, adult, and obstetric clinics and wards. Participants are also enrolled from the Mowbray Maternity Hospital, pediatric cardiology outreach clinics at George, Paarl, and Worcester Hospitals, and via engagement with CHD advocacy groups and CHD awareness events (Figure 2). Additionally, recruitment has begun at Windhoek Central Hospital, Namibia; however, these participants are not included in this analysis. To minimize selection bias, recruitment to Aim 1 was systematic. All patients referred to the above-mentioned cardiology service were screened via folder review (for prevalent cases) and echocardiogram (for all incident and certain prevalent cases). All patients found to have structural CHD and fitting the inclusion and exclusion criteria were invited to participate in Aim 1. Aim 2 and 3 participants are selected from Aim 1 via convenience and purposive sampling, respectively. Additionally, a convenience sample of pediatric participants admitted to the RCWMCH cardiology ward was selected for interview

Inclusion and Exclusion Criteria
All patients with an echocardiogram-confirmed diagnosis of structural CHD are considered eligible for inclusion in the study. Participants with isolated conduction or functional abnormalities, patent foramen ovale, peripheral pulmonary stenosis, or patent ductus arteriosus in premature infants are excluded.

Analysis
The proportion of CHD-subtypes in PROTEA's pediatric cohort was compared with the proportion of CHD subtypes in two global CHD birth-prevalence studies by van der Linde et al. (4) and Liu et al. (8). Twenty-six CHD-subtypes were selected for comparison. These subtypes were selected to match the ICD 9 and 10 subtype data presented in Liu et al. (8). Van der Linde et al. (4) only present data for the 8 most common CHD-subtypes in their analysis, all of which are included in the 26 subtypes above.
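As a hedged illustration of the quantity being compared, the cohort-prevalence of a subtype and its ratio between two cohorts can be written as below. The symbols $n_s$, $N$, $p_s$ and $\mathrm{PR}_s$ are notation introduced here for illustration only, and the sketch assumes that the subtype proportions in the reference studies are formed in the same way (subtype cases divided by all CHD cases). Which cohort is placed in the numerator is a matter of convention.

```latex
% n_s : number of cohort members with CHD-subtype s
% N   : total number of confirmed CHD cases in that cohort
p_s = \frac{n_s}{N},
\qquad
\mathrm{PR}_s = \frac{p_s^{\text{PROTEA}}}{p_s^{\text{ref}}}
```

Under this convention, a ratio above 1 means that subtype $s$ makes up a larger share of the PROTEA pediatric cohort than of the reference cohort, and a ratio below 1 means a smaller share.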
Cohort-prevalence ratios were calculated using R (version 4.0.0, R Foundation) (21) and the R-package epiR (version 1.0-14, Stevenson 2020) (22). Contingency tables were created for each CHD subtype and used to calculate prevalence ratios between PROTEA and both Liu et al. (8) and van der Linde et al. (4) independently. The 95% confidence intervals (CI) for the prevalence ratios were calculated using the Wald method. In addition, p-values were generated using the chi-square test for independence; p-values < 0.05 were considered significant.
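The calculation described above can be sketched as follows. This is a minimal illustration in Python, not the R and epiR code used in the study. It assumes that the Wald interval is formed on the log scale of the ratio and that the chi-square test is applied without continuity correction. The function name `prevalence_ratio`, the result type, and the counts in the example are hypothetical and are not PROTEA data.

```python
import math
from statistics import NormalDist
from typing import NamedTuple


class RatioResult(NamedTuple):
    ratio: float    # cohort-prevalence ratio (cohort A / cohort B)
    lower: float    # lower Wald confidence limit
    upper: float    # upper Wald confidence limit
    chi2: float     # Pearson chi-square statistic (1 degree of freedom)
    p_value: float  # p-value of the chi-square test


def prevalence_ratio(cases_a, total_a, cases_b, total_b, level=0.95):
    """Compare the proportion of one subtype between two cohorts.

    Assumption: log-scale Wald interval, no continuity correction.
    """
    ratio = (cases_a / total_a) / (cases_b / total_b)

    # Standard error of log(ratio) for two independent proportions.
    se_log = math.sqrt(1 / cases_a - 1 / total_a + 1 / cases_b - 1 / total_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)

    # 2x2 contingency table: subtype present / absent in each cohort.
    a, b = cases_a, total_a - cases_a
    c, d = cases_b, total_b - cases_b
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return RatioResult(ratio, lower, upper, chi2, p_value)


# Hypothetical counts: 120 of 1,000 cases in cohort A, 60 of 1,000 in cohort B.
result = prevalence_ratio(120, 1000, 60, 1000)
print(result)
```

Only the standard library is used, so the sketch can be checked independently of any statistics package. Results from epiR may differ slightly if it applies different interval or test options.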

Aims 2 and 3
The methods and results of aims 2 and 3 are beyond the scope of this paper and will be presented in future articles (19).

Data Management and Security
All data are stored in the PROTEA application and database. The PROTEA application was developed using FileMaker (Claris International Inc., Santa Clara, CA) and integrates an EHR with a research database. Security features include encryption of data at rest, hierarchical access control and data encryption between client and server. Data integrity is ensured via intelligent prompting, audit logs recording all changes as well as incremental backups to geographically separated, redundant disk arrays.

RESULTS

Enrolment
Over the initial 2-year period, 1,473 patients were enrolled (1,346 pediatric; 109 adult and 18 from the combined cardio-obstetric clinic); 355 participants were added to the DNA repository with whole exome sequencing and copy number variant analysis completed on 120 samples each (Figure 2). Analysis of the resulting data is in progress.

CHD-Subtype Cohort-Prevalences
The prevalence of VSDs was significantly lower in the PROTEA pediatric cohort than in both Liu et al. (8) and van der Linde et al. (4).

DISCUSSION
This first report of the PROTEA cohort revealed proportions of CHD subtypes that were significantly different from global estimates. Globally, the prevalence of CHD is increasing, largely due to the increased availability and technical capability of echocardiography (8), which has resulted in increased diagnosis and reported prevalence of mild lesions like ASDs, PDAs, and VSDs. In fact, ASDs, PDAs and VSDs combined accounted for 93.4% of the increased overall prevalence of CHD from 1970 to 2017, as reported in Liu et al. (8). The prevalence of severe CHD subtypes has remained relatively constant but with a decrease in prevalence of left ventricular outflow tract obstructions, conotruncal defects and AVSDs, likely the result of improved antenatal ultrasonography and the elective termination of affected pregnancies (TOP) (8,23). Globally, these trends have resulted in an increased proportion of mild CHD subtypes and a reduction in the proportion of severe CHD subtypes.
South Africa's reported prevalence rates may not follow this trend (9). Despite being listed as an upper-middle-income nation by the World Bank, South Africa is a dual economy with a high degree of income inequality (24) and associated inequalities in health care access (25). There is no official South African Department of Health policy regarding newborn screening for critical CHD (26), and neither cardiac examination nor chest auscultation is prescribed for well-child visits or in the management of sick children at primary health care centers (27)(28)(29)(30). As a result, it is likely that many South African children with mild CHD remain undiagnosed, and this may be reflected by the lower proportions of VSDs, ASDs, AS, CoA, and PS seen in the PROTEA cohort.
Internationally, the proportion of severe CHD is decreasing, primarily due to the increase in mild subtypes but possibly also as a result of increased antenatal detection of severe CHD and elective TOP (8). In contrast, the proportion of severe CHD subtypes in the PROTEA cohort remains high. This is likely a consequence of the lower proportion of mild CHD subtypes in the cohort, but increased detection and referral rates for severe CHD-subtypes relative to mild and moderate subtypes may have contributed to this finding. AVSDs, PA and DORV are associated with early and severe symptoms which are less likely to be missed during routine examination. AVSDs, in particular, are associated with trisomy-21, and infants with this well-recognized syndrome are routinely referred for full cardiac workup even when asymptomatic. In addition, poor adherence to antenatal prevention strategies, limited access to antenatal ultrasound and reduced antenatal diagnosis of severe CHD, in combination with physical, cultural and religious barriers that reduce access to TOP services (26,31), may have contributed to higher proportions of severe CHD in the PROTEA cohort. Without true prevalence data, inferences in this regard are speculative; however, our findings show low antenatal detection rates (Adult 10%, Pediatric 19%) and low rates of antenatal folate supplementation, which may support this hypothesis. Interestingly, PROTEA's findings are similar to other African CHD cohorts and registries (10)(11)(12), which show lower proportions of VSDs (16-27%) and ASDs (6-12%) and higher proportions of AVSDs (6-9%), DORV (3%) and TOF (7-17%) consistently across all cohorts (Table 2). These similarities may be the result of related sampling strategies and their inherent biases; however, we think it more likely that they reflect similarities in the health care landscape, including diagnosis and reporting rates, management, and early mortality rates.

TABLE 2 | CHD-subtype cohort-prevalences across 4 African studies (10-12) compared with the global birth prevalence meta-analysis by Liu et al. (8). Results are highlighted in bold and in pink where African cohort-prevalences are lower or in green where they are higher.

LIMITATIONS
The PROTEA cohort is a convenience sample of patients with CHD presenting to the Western Cape CHD service. As such, the external validity of the PROTEA cohort is at risk due to potential sampling bias. In addition, like all hospital-based registries, the true size of PROTEA's source population is technically unknown. This is due to factors such as ill-defined referral areas and differences in availability and accessibility of health services within the source population. Accordingly, the data may not be generalizable and should not be used to make inferences about the true population prevalence of CHD or CHD subtypes in the region. However, one can use the proportion of CHD subtypes within the cohort, the "cohort-prevalence," to make comparisons with findings in other studies because, in this case, the denominator (the total number of confirmed CHD cases) is known. Importantly, differences in "cohort-prevalences" of CHD-subtypes may result from differences in sampling strategy, inclusion and exclusion criteria or diagnosis classification. However, as we believe to be the case here, they may indicate differences in diagnosis and reporting rates, management, and outcomes in CHD-subtypes in the region, and these need investigating.

CONCLUSION
The comparison of PROTEA's pediatric CHD cohort with international prevalence studies shows differences in the proportions of CHD-subtypes that warrant further investigation. The lower proportion of mild CHD may indicate missed diagnoses that, if untreated, could lead to unnecessary morbidity and mortality. The higher proportion of severe subtypes is likely a consequence of the lower proportion of mild CHD-subtypes, but increased detection and reporting rates, relative to mild subtypes, may contribute to this trend. Additionally, poor primary prevention, reduced antenatal detection and lower TOP rates may have resulted in South Africa not experiencing the same degree of reduction in prevalence of certain severe subtypes that has been seen internationally. Certainty in this regard is essential to guide prevention strategies, antenatal and post-natal screening practices, and the allocation of resources in the management of CHD. Importantly, these findings highlight the urgent need for robust epidemiological research into CHD in the southern African region, including a thorough and accurate CHD birth prevalence study.

PATIENT AND PUBLIC INVOLVEMENT
The design and implementation of the PROTEA project were governed by a steering committee whose members include CHD patients, parents, and advocacy group leaders. The PROTEA research group hosts annual CHD awareness events for patients and families. The focus of these events is to educate on CHD, give feedback on current research, and discuss future research goals.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the University of Cape Town, Faculty of Health Sciences, Human Research Ethics Committee (R017-2014). Written informed consent to participate in this study was provided by the participants or by their legal guardian/next of kin, where appropriate.

ACKNOWLEDGMENTS
Prof. Bongani Mayosi was a co-investigator on this project until his death in 2018. We acknowledge his scientific input, his vision and support and his incredible legacy in growing and developing capacity in clinical scientists across the continent.
In addition, we thank Ms.